Spatial analysis is a process that begins with exploring and mapping a dataset and can lead to potentially complex models and visualizations of real world features and phenomena. Spatial queries are the building blocks of this analytical process. These queries are software operations that allow us to ask questions of our data and which return data metrics, subsets or new data objects. In this lesson we explore the two basic types of spatial queries: measurement queries and relationship queries.
Instructor Notes
The basic types of spatial queries are:
Both of these types of queries operate on the geometry of features in one or two datasets and are dependent on the type of geometry. For example, with point features you can make distance measurements or ask what points are spatially inside polygon objects. Polygon features, on the other hand, allow for a wider range of both measurement and spatial relationship queries.
An important distinction between these two types of queries is that measurement queries depend on the CRS of the data while spatial relationship queries do not. This is because topological relationships, the term used to describe spatial relationships, are invariant to rotation, translation and scaling transformations like those that CRS transformations entail.
We already know how to do attribute queries with our data. For example, we can select one or more specific counties by name or select those counties where the total population is greater than 100,000 because we have these columns in the dataset.
Spatial queries are special because they are dynamic. For example, we can compute area from the geometry without it already being encoded or we can select BART stations in Berkeley even if city is not encoded in the BART data by linking those two spatial datasets in the same geographic space. This dynamic query capability is extremely powerful!
In this lesson we’ll work through examples of each of those types of queries.
Then we’ll try a very common spatial analysis method that is a conceptual amalgam of those two types: proximity analysis.
Load the libraries we will use.
library(sf)
library(tmap)
library(here)
Read in the CA census tracts data and then take a look at its geometry and attributes.
census_tracts = st_read(here("notebook_data",
"census",
"Tracts",
"cb_2019_06_tract_500k.shp"))
## Reading layer `cb_2019_06_tract_500k' from data source
## `C:\Users\kathe\berkeley_drive\analyses\dlab\Geospatial-Fundamentals-in-R-with-sf\notebook_data\census\Tracts\cb_2019_06_tract_500k.shp'
## using driver `ESRI Shapefile'
## Simple feature collection with 8041 features and 9 fields
## Geometry type: MULTIPOLYGON
## Dimension: XY
## Bounding box: xmin: -124.4096 ymin: 32.53416 xmax: -114.1312 ymax: 42.00948
## Geodetic CRS: NAD83
plot(census_tracts$geometry)
head(census_tracts)
## Simple feature collection with 6 features and 9 fields
## Geometry type: MULTIPOLYGON
## Dimension: XY
## Bounding box: xmin: -122.327 ymin: 33.83556 xmax: -117.9535 ymax: 37.94975
## Geodetic CRS: NAD83
## STATEFP COUNTYFP TRACTCE AFFGEOID GEOID NAME LSAD
## 1 06 013 370000 1400000US06013370000 06013370000 3700 CT
## 2 06 001 442301 1400000US06001442301 06001442301 4423.01 CT
## 3 06 037 405101 1400000US06037405101 06037405101 4051.01 CT
## 4 06 037 199800 1400000US06037199800 06037199800 1998 CT
## 5 06 037 291300 1400000US06037291300 06037291300 2913 CT
## 6 06 037 292000 1400000US06037292000 06037292000 2920 CT
## ALAND AWATER geometry
## 1 999356 0 MULTIPOLYGON (((-122.327 37...
## 2 1429049 0 MULTIPOLYGON (((-121.9701 3...
## 3 1121229 0 MULTIPOLYGON (((-117.9693 3...
## 4 651258 0 MULTIPOLYGON (((-118.2156 3...
## 5 2353751 45836 MULTIPOLYGON (((-118.3091 3...
## 6 4000998 1856 MULTIPOLYGON (((-118.3091 3...
Select just the Alameda County census tracts.
census_tracts_ac = census_tracts[census_tracts$COUNTYFP=='001',]
plot(census_tracts_ac)
We’ll start off with some simple measurement queries.
We can get the area of each of our census tracts using thesf function st_area.
st_area(census_tracts_ac)[1:10]
## Units: [m^2]
## [1] 1433707.1 860583.5 485411.1 337395.8 367477.3 4350755.5 344063.1
## [8] 3077954.4 1029301.4 485466.0
Okay!
We got…
numbers!
…?
Question
Let’s take a look at our CRS.
st_crs(census_tracts_ac)
## Coordinate Reference System:
## User input: NAD83
## wkt:
## GEOGCRS["NAD83",
## DATUM["North American Datum 1983",
## ELLIPSOID["GRS 1980",6378137,298.257222101,
## LENGTHUNIT["metre",1]]],
## PRIMEM["Greenwich",0,
## ANGLEUNIT["degree",0.0174532925199433]],
## CS[ellipsoidal,2],
## AXIS["latitude",north,
## ORDER[1],
## ANGLEUNIT["degree",0.0174532925199433]],
## AXIS["longitude",east,
## ORDER[2],
## ANGLEUNIT["degree",0.0174532925199433]],
## ID["EPSG",4269]]
Wow! We’re working with data that are in what is called an unprojected CRS. That means that the coordinates are latitude and longitude values and the units are decimal degrees. However, the sf::st_area function automatically returned area measurements in square meters (rather than in square degrees, which don’t make sense.)
How did it do this? Well, you can check out the help documentation for ?st_area for more information. If the data have a projected CRS, st_area uses Euclidean geometry to return area measurements in the units of the CRS. For an unprojected CRS, st_area calculates geodetic area on a curved surface model of the Earth and returns measurements in square meters. Pretty cool and pretty useful right?
That said, when doing spatial analysis, we will almost always want to convert all of our data to the same projected CRS since many spatial operations do not work with geographic CRSs.
Time to project! We’ll transform the data to the UTM Zone 10N, NAD83 CRS (EPSG 26910) which is appropriate for Northern California location data and is highly accurate for measurement queries for areas within the zone.
#Transform CRS of census tract data
census_tracts_ac_utm10 = st_transform(census_tracts_ac, 26910)
Now check it..especially look for the units.
st_crs(census_tracts_ac_utm10)
## Coordinate Reference System:
## User input: EPSG:26910
## wkt:
## PROJCRS["NAD83 / UTM zone 10N",
## BASEGEOGCRS["NAD83",
## DATUM["North American Datum 1983",
## ELLIPSOID["GRS 1980",6378137,298.257222101,
## LENGTHUNIT["metre",1]]],
## PRIMEM["Greenwich",0,
## ANGLEUNIT["degree",0.0174532925199433]],
## ID["EPSG",4269]],
## CONVERSION["UTM zone 10N",
## METHOD["Transverse Mercator",
## ID["EPSG",9807]],
## PARAMETER["Latitude of natural origin",0,
## ANGLEUNIT["degree",0.0174532925199433],
## ID["EPSG",8801]],
## PARAMETER["Longitude of natural origin",-123,
## ANGLEUNIT["degree",0.0174532925199433],
## ID["EPSG",8802]],
## PARAMETER["Scale factor at natural origin",0.9996,
## SCALEUNIT["unity",1],
## ID["EPSG",8805]],
## PARAMETER["False easting",500000,
## LENGTHUNIT["metre",1],
## ID["EPSG",8806]],
## PARAMETER["False northing",0,
## LENGTHUNIT["metre",1],
## ID["EPSG",8807]]],
## CS[Cartesian,2],
## AXIS["(E)",east,
## ORDER[1],
## LENGTHUNIT["metre",1]],
## AXIS["(N)",north,
## ORDER[2],
## LENGTHUNIT["metre",1]],
## USAGE[
## SCOPE["Engineering survey, topographic mapping."],
## AREA["North America - between 126°W and 120°W - onshore and offshore. Canada - British Columbia; Northwest Territories; Yukon. United States (USA) - California; Oregon; Washington."],
## BBOX[30.54,-126,81.8,-119.99]],
## ID["EPSG",26910]]
Now let’s try our area calculation again.
st_area(census_tracts_ac_utm10)[1:10]
## Units: [m^2]
## [1] 1433564.7 860479.2 485354.4 337347.3 367423.6 4350606.4 344013.5
## [8] 3077827.6 1029173.7 485399.5
What if we compare areas calculated with our unprojected and projected CRS data?
# Using format to make for easier to read display
print(format(st_area(census_tracts_ac)[[1]], big.mark = ','))
## [1] "1,433,707 [m^2]"
print(format(st_area(census_tracts_ac_utm10)[[1]], big.mark = ","))
## [1] "1,433,565 [m^2]"
Hmmm… The numbers are a bit different…specifically…
format((st_area(census_tracts_ac)[[1]] - st_area(census_tracts_ac_utm10)[[1]]),
digits = 0,
big.mark = ',')
## [1] "142 [m^2]"
You may have noticed that our census tracts already have an area column in them.
Let’s also compare the calculated areas with the data in this column.
print(st_area(census_tracts_ac)[[1]])
## 1433707 [m^2]
print(st_area(census_tracts_ac_utm10)[[1]])
## 1433565 [m^2]
print(census_tracts$ALAND[1])
## [1] 999356
Question
What explains the discrepancy? Which areas are correct? Which are incorrect?
We can also calculate the area for Alameda county summing the areas of all census tracts.
sum(st_area(census_tracts_ac_utm10))
## 1943203551 [m^2]
We can look up how large Alameda County is to check our work.The county is 739 miles2, which is around 1,914,001,213 meters2. I’d say we’re pretty close!
# Sum the area of all Alameda county Census Tracts dynamically in square miles...
sum(units::set_units(st_area(census_tracts_ac_utm10), mi^2))
## 750.2751 [mi^2]
Calculating the area of all features and adding the output to the dataframe is a useful operation because it allows us to convert count variables like population to densities.
# Add the area of all Alameda County Census tracts to the data frame
census_tracts_ac_utm10$area_sqmi <- units::set_units(st_area(census_tracts_ac_utm10), mi^2)
# Check it by summing
print(sum(census_tracts_ac_utm10$area_sqmi))
## 750.2751 [mi^2]
# Take a look
head(census_tracts_ac_utm10,3)
## Simple feature collection with 3 features and 10 fields
## Geometry type: MULTIPOLYGON
## Dimension: XY
## Bounding box: xmin: 579937.1 ymin: 4153335 xmax: 592636.3 ymax: 4167206
## Projected CRS: NAD83 / UTM zone 10N
## STATEFP COUNTYFP TRACTCE AFFGEOID GEOID NAME LSAD
## 2 06 001 442301 1400000US06001442301 06001442301 4423.01 CT
## 73 06 001 437400 1400000US06001437400 06001437400 4374 CT
## 114 06 001 437701 1400000US06001437701 06001437701 4377.01 CT
## ALAND AWATER geometry area_sqmi
## 2 1429049 0 MULTIPOLYGON (((590997 4154... 0.5535024 [mi^2]
## 73 858407 0 MULTIPOLYGON (((580052.1 41... 0.3322329 [mi^2]
## 114 484105 0 MULTIPOLYGON (((581636.7 41... 0.1873964 [mi^2]
You may be wondering how R is managing the units of our measurements.
It turns out that sf depends on the units package to track units.
This is super convenient! But there is a gotcha:
# convert to square kilometers
sum(st_area(census_tracts_ac_utm10)) / (1000^2)
## 1943.204 [m^2]
Oops! Our manual conversion to square kilometers gave us the right number but kept the now-wrong units!
Here’s the proper way to convert:
units::set_units(sum(st_area(census_tracts_ac_utm10)), km^2)
## 1943.204 [km^2]
Much nicer! In case you’re wondering how we knew the right abbreviation to use for kilometers, check out the leftmost column in this reference table:
# View(units::valid_udunits())
We can use the st_length operator in the same way to calculate the length or perimeter of features. Always take note of the output units!
st_length(census_tracts)[1:10]
## Units: [m]
## [1] 0 0 0 0 0 0 0 0 0 0
The st_distance function can be used to find the pairwise distance between two sets of geometries.
st_distance(census_tracts_ac_utm10, census_tracts_ac_utm10)[1:5,1:5]
## Units: [m]
## [,1] [,2] [,3] [,4] [,5]
## [1,] 0.00 15426.9778 14079.4434 41190.08 40789.89
## [2,] 15426.98 0.0000 686.6722 24396.80 24010.87
## [3,] 14079.44 686.6722 0.0000 26033.64 25670.59
## [4,] 41190.08 24396.8018 26033.6355 0.00 0.00
## [5,] 40789.89 24010.8725 25670.5915 0.00 0.00
You can also use it to find the distance between specific features.
# Identify my tracts of interest
mytracts = c('4201','4202','4203','4204')
# What is the distance between tract 4201 and all other tracts
st_distance(census_tracts_ac_utm10[census_tracts_ac_utm10$NAME=='4101',],
census_tracts_ac_utm10[census_tracts_ac_utm10$NAME %in% mytracts,] )
## Units: [m]
## [,1] [,2] [,3] [,4]
## [1,] 19220.14 18822.72 19263.81 18946.69
Spatial relationship queries consider how two geometries or sets of geometries relate to one another in space. For example, you may want to know what schools are located within the City of Berkeley or what East Bay Regional Parks have land within Berkeley. You may also want to combine a measurement query with a spatial relationship query. Example, you may want to know the total length of freeways within the city of Berkeley.
Here is a list of some of the more commonly used sf spatial relationship operations.
These can be used to select features in one dataset based on their spatial relationship to another. In other works, you can use these operations to make spatial selections / create spatial subsets.
Enough talk. Let’s work through some examples.
First, load the CA Places data and select the city of Berkeley and save it to a sf dataframe.
places = st_read(here("notebook_data",
"census",
"Places",
"cb_2019_06_place_500k.shp"))
## Reading layer `cb_2019_06_place_500k' from data source
## `C:\Users\kathe\berkeley_drive\analyses\dlab\Geospatial-Fundamentals-in-R-with-sf\notebook_data\census\Places\cb_2019_06_place_500k.shp'
## using driver `ESRI Shapefile'
## Simple feature collection with 1521 features and 9 fields
## Geometry type: MULTIPOLYGON
## Dimension: XY
## Bounding box: xmin: -124.2694 ymin: 32.53416 xmax: -114.229 ymax: 41.99314
## Geodetic CRS: NAD83
berkeley = places[places$NAME=='Berkeley',]
plot(berkeley$geometry)
Then, load the Alameda County schools data and make it a spatial dataframe.
schools_df = read.csv(here("notebook_data",
"alco_schools.csv"))
schools_sf = st_as_sf(schools_df,
coords = c('X','Y'),
crs = 4326)
Check that the two sf dataframes have the same CRS.
st_crs(schools_sf) == st_crs(berkeley)
## [1] FALSE
They don’t have the same CRS so we need to align them. Let’s transform (or reproject) the CRS of both of these dataframes to UTM10N, NAD83 (EPSG 26910). This is a commonly used CRS for Northern CA data.
# Transform data CRSes
schools_utm10 <- st_transform(schools_sf, 26910)
berkeley_utm10 <- st_transform(berkeley, 26910)
If you look at the Schools data you will see that it has a City column. So we can subset the data by attribute to select the Schools in Berkeley. No need to do a spatial selection.
berkeley_schools = schools_utm10[schools_utm10$City == 'Berkeley', ]
dim(berkeley_schools)
## [1] 31 8
Confirm the results by plotting the data
plot(berkeley_utm10$geometry)
plot(berkeley_schools$geometry, add = T)
That looks good and was a relatively simple operation. But what if the schools data didn’t have that city column or if only some of the rows had a value in that column. How can we identify the schools in Berkeley spatially?
Here’s how!
# SPATIALLY select only the schools within Berkeley
berkeley_schools_spatial = schools_utm10[berkeley_utm10, , op = st_intersects] # NO QUOTES!!!
Yes that was it! Take a long look at that simple yet powerful spatial selection syntax.
You should interpret that syntax as:
"Select all features (i.e. rows) in the schools_utm10 dataframe:
and all of the columns:
(all because the extraction brackets have no second argument)
whose geometry spatially intersects the Berkeley_utm10 geometry
The op=st_intersects argument is optional because st_intersects is the default spatial selector.
To emphasize this, let’s rerun the last command.
# SPATIALLY select only the schools within Berkeley
berkeley_schools_spatial = schools_utm10[berkeley_utm10, ]
spatiallly intersects mean?Here’s one way to explain it.
Geometry A spatially intersects Geometry B if any of its parts (e.g., a point, line segment, or polygon) is equivalent to, touches, crosses, is contained by, contains, or overlaps any part of Geometry B.
So st_intersects is the mother of all spatial relationships! It is the most general and the most useful. However, you can specify any of those more specific spatial relationships by setting op= to any of the options listed in the ?st_intersects? help documentation.
Let’s check out the sf object that our selection returned.
# How many schools did we get
dim(berkeley_schools_spatial)
## [1] 32 8
# Map the results
plot(berkeley_utm10$geometry)
plot(berkeley_schools_spatial$geometry, add = T)
Interestingly, we have one more school in Berkeley based on the spatial selection!? Let’s take a look.
plot(berkeley_utm10$geometry)
plot(berkeley_schools_spatial$geometry, add = T)
plot(berkeley_schools$geometry, col="red", add = T)
Let’s use an interactive tmap to zoom into the school that was not selected by attribute but was spatially selected.
tmap_mode('view')
## tmap mode set to interactive viewing
tm_shape(berkeley_utm10) +
tm_borders() +
tm_shape(berkeley_schools_spatial) +
tm_dots(col = "black", size = 0.3) +
tm_shape(berkeley_schools) +
tm_dots(col = "red", size = 0.1)
IMPORTANT: The default spatial selection operator is
st_intersects. If you want to use any other operator, it must be specified.
For example, we can use the st_disjoint operator to select only those schools NOT in Berkeley.
# Select all Alameda County Schools NOT in Berkeley with the disjoint operator
berkeley_schools_disjoint = schools_utm10[berkeley_utm10, , op = st_disjoint]
# Plot the result
plot(berkeley_schools_disjoint$geometry)
plot(berkeley_utm10,
col = NA,
border = "red",
add = T)
## Warning in plot.sf(berkeley_utm10, col = NA, border = "red", add = T): ignoring
## all but the first attribute
There is no need to memorize these spatial operators (aka predicates)! Here is a fantastic sf cheatsheet that lists and briefly explains all these common functions (and many more).
Let’s load a new dataset, the CA Protected Areas Database (CPAD), to demonstrate these spatial queries in more detail.
This dataset contains all of the protected areas (parks and the like) in California.
We will use this data and the Alameda County census tract data that we created earlier to ask, “What protected areas are within Alameda County?”
First load the CPAD data.
cpad <- st_read(here("notebook_data",
"protected_areas",
"CPAD_2020a_Units.shp"))
## Reading layer `CPAD_2020a_Units' from data source
## `C:\Users\kathe\berkeley_drive\analyses\dlab\Geospatial-Fundamentals-in-R-with-sf\notebook_data\protected_areas\CPAD_2020a_Units.shp'
## using driver `ESRI Shapefile'
## Simple feature collection with 17068 features and 21 fields
## Geometry type: MULTIPOLYGON
## Dimension: XY
## Bounding box: xmin: -374984.2 ymin: -604454.8 xmax: 540016.3 ymax: 449743.2
## Projected CRS: NAD83 / California Albers
What is the CRS of the CPAD data?
Let’s transform the data to match census_tracts_ac_utm10.
cpad_utm10 <- st_transform(cpad, st_crs(census_tracts_ac_utm10))
Let’s plot the data in so that we know what to expect. CPAD is big, so wait for it…
plot(census_tracts_ac_utm10$geometry, col = 'grey', border = "grey")
plot(cpad_utm10$geometry, col = 'green', add = T)
We can see from our map that some of the protected areas are completely within Alameda County, some of them overlap, and some are completely outside of the county. To get both of the “inside” and “overlaps” cases we use the st_intersects spatial selection operator, which is the default. Let’s check it out.
cpad_intersects <- cpad_utm10[census_tracts_ac_utm10,] # st_intersects
cpad_within <- cpad_utm10[census_tracts_ac_utm10, , op = st_within] # st_within
We can use tmap to explore the difference in the results from st_intersects vs st_within
tmap_mode('view')
## tmap mode set to interactive viewing
tm_shape(census_tracts_ac_utm10) +
tm_polygons(col = "gray",
border.col = "grey") +
tm_shape(cpad_intersects) +
tm_borders(col = "green") +
tm_shape(cpad_within) +
tm_borders(col = 'red')
What you can see from the above, is that by default, st_intersects returns the features that intersect but it does not clip the features to the boundary of Alameda County. For that, we would need to use a different spatial operation - st_intersection.
Let’s try it!
cpad_in_ac <- st_intersection(cpad_utm10, census_tracts_ac_utm10)
## Warning: attribute variables are assumed to be spatially constant throughout all
## geometries
Great! Now, if we scroll the resulting sf object we’ll see that the COUNTY column of our resulting subset gives us a good sanity check on our results. Or does it?
table(cpad_in_ac$COUNTY)
##
## Alameda Contra Costa San Joaquin Santa Clara
## 837 20 2 4
Always check your output - both the attribute table & the geometry!
head(cpad_in_ac)
## Simple feature collection with 6 features and 31 fields
## Geometry type: POLYGON
## Dimension: XY
## Bounding box: xmin: 562862.8 ymin: 4166288 xmax: 598493.1 ymax: 4184898
## Projected CRS: NAD83 / UTM zone 10N
## ACCESS_TYP UNIT_ID UNIT_NAME SUID_NMA AGNCY_ID
## 8992 Open Access 14846 Schafer Park 25179 2045
## 9491 Open Access 15886 Eden Greenway 18427 2045
## 2479 Open Access 17473 Lowell Park 21557 1228
## 9186 Open Access 15251 Owens Plaza Park 23276 1257
## 9194 Open Access 15260 Creekside Park 17777 1257
## 1778 Open Access 13264 Jefferson Square 20383 1228
## AGNCY_NAME AGNCY_LEV
## 8992 Hayward Area Recreation and Park District Special District
## 9491 Hayward Area Recreation and Park District Special District
## 2479 Oakland, City of City
## 9186 Pleasanton, City of City
## 9194 Pleasanton, City of City
## 1778 Oakland, City of City
## AGNCY_TYP
## 8992 Recreation/Parks District
## 9491 Recreation/Parks District
## 2479 City Agency
## 9186 City Agency
## 9194 City Agency
## 1778 City Agency
## AGNCY_WEB LAYER
## 8992 http://www.haywardrec.org/ Special District
## 9491 http://www.haywardrec.org/ Special District
## 2479 http://www2.oaklandnet.com/Government/o/opr/index.htm City
## 9186 http://www.cityofpleasantonca.gov/ City
## 9194 http://www.cityofpleasantonca.gov/ City
## 1778 http://www2.oaklandnet.com/Government/o/opr/index.htm City
## MNG_AG_ID MNG_AGENCY MNG_AG_LEV
## 8992 2045 Hayward Area Recreation and Park District Special District
## 9491 2045 Hayward Area Recreation and Park District Special District
## 2479 1228 Oakland, City of City
## 9186 1257 Pleasanton, City of City
## 9194 1257 Pleasanton, City of City
## 1778 1228 Oakland, City of City
## MNG_AG_TYP PARK_URL COUNTY ACRES LABEL_NAME YR_EST
## 8992 Recreation/Parks District <NA> Alameda 1.748 Schafer Park 0
## 9491 Recreation/Parks District <NA> Alameda 54.736 Eden Greenway 1970
## 2479 City Agency <NA> Alameda 8.710 Lowell Park 0
## 9186 City Agency <NA> Alameda 2.554 Owens Plaza Park 0
## 9194 City Agency <NA> Alameda 5.719 Creekside Park 2000
## 1778 City Agency <NA> Alameda 1.376 Jefferson Square 0
## DES_TP GAP_STS STATEFP COUNTYFP TRACTCE
## 8992 Local Park 4 06 001 437400
## 9491 Local Recreation Area 4 06 001 437400
## 2479 Local Park 4 06 001 402500
## 9186 Local Park 4 06 001 450743
## 9194 Local Park 4 06 001 450743
## 1778 Local Park 4 06 001 403100
## AFFGEOID GEOID NAME LSAD ALAND AWATER
## 8992 1400000US06001437400 06001437400 4374 CT 858407 0
## 9491 1400000US06001437400 06001437400 4374 CT 858407 0
## 2479 1400000US06001402500 06001402500 4025 CT 365454 0
## 9186 1400000US06001450743 06001450743 4507.43 CT 4333270 0
## 9194 1400000US06001450743 06001450743 4507.43 CT 4333270 0
## 1778 1400000US06001403100 06001403100 4031 CT 346852 0
## area_sqmi geometry
## 8992 0.3322329 [mi^2] POLYGON ((580615.7 4166758,...
## 9491 0.3322329 [mi^2] POLYGON ((580206.2 4166317,...
## 2479 0.1418630 [mi^2] POLYGON ((562897.3 4184898,...
## 9186 1.6797785 [mi^2] POLYGON ((598045.1 4172639,...
## 9194 1.6797785 [mi^2] POLYGON ((598289.7 4172276,...
## 1778 0.1328244 [mi^2] POLYGON ((563481.4 4184006,...
Let’s also use an overlay plot to check the output geometry.
tm_shape(census_tracts_ac_utm10) +
tm_polygons(col = 'gray',
border.col = 'gray') +
tm_shape(cpad_in_ac) +
tm_polygons(col = 'ACRES',
palette = 'YlGn',
border.col = 'black',
lwd = 0.4,
alpha = 0.8,
title = 'Protected areas in Alameda County, colored by area')
It really depends! But make sure you understand the difference.
st_intersects is a logical operator that returns True if two geometries intersect in any way. When used to subset (or filter) a spatial dataframe, st_intersects returns those features in the dataframe that intersect with the filter dataframe.
On the other hand, st_intersection returns a new spatial dataframe that set intersection of the two dataframes, including both the geometries and attributes of the intersecting features. Use st_intersection with caution and always check your results.
It’s your turn.
Write a spatial relationship query to create a new dataset containing only the BART stations in Berkeley.
Run the next two cells to (1) load the dataset containing Berkeley BART stations and then reproject it to the same CRS as that used by the Berkeley_utm10 dataframe (EPSG: 26910)’ (2) plot these two datasets in an overlay map.
Then, write your own code to: 1. Spatially select the BART stations that are within Berkeley 2. Plot the Berkeley boundary and then overlay the selected BART stations.
# load the Berkeley boundary
bart_df = st_read(here("notebook_data",
"transportation",
"bart.csv"))
## Reading layer `bart' from data source
## `C:\Users\kathe\berkeley_drive\analyses\dlab\Geospatial-Fundamentals-in-R-with-sf\notebook_data\transportation\bart.csv'
## using driver `CSV'
## Warning: no simple feature geometries present: returning a data.frame or tbl_df
bart_sf = st_as_sf(bart_df,
coords = c('lon', 'lat'),
crs = 4326)
# transform to EPSG:26910
bart_utm10 = st_transform(bart_sf, st_crs(berkeley_utm10))
# display
head(berkeley_utm10)
## Simple feature collection with 1 feature and 9 fields
## Geometry type: MULTIPOLYGON
## Dimension: XY
## Bounding box: xmin: 559347.6 ymin: 4188961 xmax: 567371.4 ymax: 4195617
## Projected CRS: NAD83 / UTM zone 10N
## STATEFP PLACEFP PLACENS AFFGEOID GEOID NAME LSAD ALAND
## 740 06 06000 02409837 1600000US0606000 0606000 Berkeley 25 27127434
## AWATER geometry
## 740 18715615 MULTIPOLYGON (((559347.6 41...
Plot the data together
plot(bart_utm10$geometry)
plot(berkeley_utm10$geometry, border = 'blue', add = T)
Now, in the cell below, write the code to spatially select the Berkeley BART stations, then make the map.
# YOUR CODE HERE:
# Spatially select the BART stations within Berkeley
# Plot the Bart stations in Berkeley overlaid on top of the Berkeley City Boundary
Now that we’ve seen the basic idea of spatial measurement and spatial relationship queries, let’s take a look at a common analysis that combines those concepts: promximity analysis.
Proximity analysis seeks to identify near features - or, in other words, all features in a focal feature set that are within some maximum distance of features in a reference feature set.
A very common workflow for this analysis is:
Buffer around the features in the reference dataset to create buffer polygons.
Run a spatial relationship query to find all focal features that intersect (or are within) the buffer polygons.
Let’s read in our bike boulevard data again.
Then we’ll find out which of our Berkeley schools are within a block’s distance (200 meters) of the bike boulevards.
bike_blvds <- st_read(here("notebook_data",
"transportation",
"BerkeleyBikeBlvds.geojson"))
## Reading layer `BerkeleyBikeBlvds' from data source
## `C:\Users\kathe\berkeley_drive\analyses\dlab\Geospatial-Fundamentals-in-R-with-sf\notebook_data\transportation\BerkeleyBikeBlvds.geojson'
## using driver `GeoJSON'
## Simple feature collection with 211 features and 10 fields
## Geometry type: MULTILINESTRING
## Dimension: XY
## Bounding box: xmin: 561541.2 ymin: 4189007 xmax: 566451.6 ymax: 4193483
## Projected CRS: WGS 84 / UTM zone 10N
plot(bike_blvds$geometry)
Of course, we need to reproject the boulevards to our projected CRS.
bike_blvds_utm10 = st_transform(bike_blvds, st_crs(berkeley_utm10))
Now we can create our 200 meter bike boulevard buffers.
bike_blvds_buf = st_buffer(bike_blvds_utm10, dist = 200)
Now let’s overlay everything.
tm_shape(berkeley_utm10) +
tm_polygons(col = 'lightgrey') +
tm_shape(bike_blvds_buf) +
tm_polygons(col = 'pink', alpha = 0.5) +
tm_shape(bike_blvds_utm10) +
tm_lines() +
tm_shape(berkeley_schools_spatial) +
tm_dots(col = 'purple', size = 0.2)
Great! Looks like we’re all ready to run our spatial relationship query to complete the proximity analysis. At this point (pun intended) select the schools that are in within the bike boulevard buffer polygons.
schools_near_blvds = berkeley_schools_spatial[bike_blvds_buf, ]
# or
#schools_near_blvds = berkeley_schools_spatial[bike_blvds_buf, , op = 'st_within']
Now let’s overlay again, to see if the schools we selected make sense.
tm_shape(berkeley_utm10) +
tm_polygons(col = 'lightgrey') +
# add the bike blvd buffer polygons
tm_shape(bike_blvds_buf) +
tm_polygons(col = 'pink', alpha = 0.5) +
# add the bike blvd line features
tm_shape(bike_blvds_utm10) +
tm_lines() +
# add all berkeley schools
tm_shape(berkeley_schools_spatial) +
tm_dots(col = 'purple', size = 0.2) +
# add schools near bike boulevards in yellow
tm_shape(schools_near_blvds) +
tm_dots(col = 'yellow', size = 0.2)
You can use st_distance and its companion function st_nearest_feature to compute the distance between each feature and the nearest bike boulevard. The st_nearest_feature function returns the ID of the closest feature.
# Identify the nearest bike boulevard for each school
nearest = st_nearest_feature(berkeley_schools_spatial, bike_blvds_utm10)
# take a look!
nearest
## [1] 71 171 39 136 42 30 163 172 127 171 189 156 137 33 187 197 50 207 169
## [20] 102 125 137 3 109 207 76 135 173 56 102 146 76
Then we can calculate the distance between each school and it’s nearest bike boulevard.
st_distance(berkeley_schools_spatial,
bike_blvds_utm10[nearest,],
by_element = TRUE)
## Units: [m]
## [1] 13.848161 985.459488 309.889446 369.402946 196.011379 15.332395
## [7] 27.250406 439.758905 107.902846 926.216792 193.030072 181.836256
## [13] 373.736477 215.903128 184.321307 186.446907 1.288383 94.064527
## [19] 211.477566 218.613628 186.913116 230.212129 15.162313 188.829602
## [25] 232.764113 224.700672 173.920971 15.892361 514.765384 92.556921
## [31] 93.426741 128.131187
Now it’s your turn to try out a proximity analysis!
Write your own code to find all BERKELEY schools within walking distance (1 km) of a BART station.
As a reminder, let’s break this into steps:
bart_utm10 dataframeTo see the solution, look at the hidden text below.
# YOUR CODE HERE:
# spatially select the Berkeley BART stations
# # (You may have done this in a previous exercise.)
# buffer the BART stations to 1 km
# spatially select the schools within the buffers
# map your results with tmap
# # plot the Berkeley boundary (for reference) in lightgrey
# add the BART stations (for reference) to the plot in green
# add the BART buffers (for check) in lightgreen
# add all Berkeley schools (for reference) in black
# add the schools near BART (for check) in yellow
Compute the distance between each Berkeley school and its nearest BART station!
# YOUR CODE HERE:
Leveraging what we’ve learned in our earlier lessons, we got to work with map overlays and start answering questions related to proximity. Key concepts include:
st_area,st_lengthst_distancest_intersects,st_intersectionst_within, etc.st_buffer